The prognostic value of autophagy related genes with potential protective function in Ewing sarcoma

Background Ewing sarcoma (ES) is the second most common primary malignant bone tumor mainly occurring in children, adolescents and young adults with high metastasis and mortality. Autophagy has been reported to be involved in the survival of ES, but the role remains unclear. Therefore, it’s necessary to investigate the prognostic value of autophagy related genes using bioinformatics methods. Results ATG2B, ATG10 and DAPK1 were final screened genes for a prognostic model. KM and risk score plots showed patients in high score group had better prognoses both in training and validation sets. C-indexes of the model for training and validation sets were 0.68 and 0.71, respectively. Calibration analyses indicated the model had high prediction accuracy in training and validation sets. The AUC values of ROC for 1-, 3-, 5-year prediction were 0.65, 0.73 and 0.84 in training set, 0.88, 0.73 and 0.79 in validation set, which suggested high prediction accuracy of the model. Decision curve analyses showed that patients could benefit much from the model. Differential and functional analyses suggested that autophagy and apoptosis were upregulated in high risk score group. Conclusions ATG2B, ATG10 and DAPK1 were autophagy related genes with potential protective function in ES. The prognostic model established by them exhibited excellent prediction accuracy and discriminatory capacities. They might be used as potential prognostic biomarkers and therapeutic targets in ES. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04849-x.

et al. is 50.6%, lower than osteosarcoma and chondrosarcoma [1]. Notably, the 5-year overall survival rate of patients with metastases is lower than 30% and up to 20-25% of patients present with metastases at diagnosis [3]. Meanwhile, patients who relapse have a dismal prognosis, especially relapse within 2 years after diagnosis, the 5-year survival rate was even less than 10% [4]. Despite the progresses have been made in the treatment of ES, but the long-term survival rates of metastatic and relapsed patients have not improved significantly [2,3]. Therefore, how to further improve the prognoses of the ES patients remains to be further studied and efforts on finding new molecular targets and new therapies for the existing targets are still ongoing.
Autophagy related genes (ARGs) in human referred to the human genes described so far as involved in autophagy and 222 ARGs were collected in the Human Autophagy Database to date (http:// www. autop hagy. lu/). As is known that autophagy is a self-degradation process of cytoplasmic proteins and damaged organelles precisely regulated by ARGs, which can prevent cell damage, promote cell survival in the absence of nutrients, respond to cytotoxic stimuli and etc. [5,6]. It's mainly activated by endoplasmic reticulum stress, low energy, nutrient starvation, hypoxia and reactive oxygen species [5,6]. Obviously, autophagy acts as a self-protection mechanism of cells under physiological conditions [5,6]. However, this survival mechanism also can help tumor cells to survive and disseminate under stress conditions [5][6][7][8][9]. Besides, many drug resistances of cancers are associated with autophagy as well [10,11]. It's reported that autophagy plays a dual role in human health and diseases [5,6]. Generally, autophagy tended to function as a tumor suppressor mechanism in the early stage of tumorigenesis, while on the late stage of tumor, it can promote tumor proliferation, aggressiveness and metastasis [7][8][9]. Nevertheless, the controversial role of autophagy has also been reported in ES [12,13]. Then, what role does autophagy play in the prognosis of ES deserves further study. Thus, in this study, we intend to explore this issue by bioinformatic tools and bridge the gap in this area. The findings of it may also indicate potential prognostic biomarkers and therapeutic targets in ES.
Therefore, in this study we intend to preliminarily explore the association between potential protective ARGs that with better prognosis in high expression group in KM analysis and the prognosis of ES, so as to find some potential prognostic biomarkers and even therapy targets for ES. Currently, several prognostic models have been reported in ES [14][15][16][17]. They explored the prognostic value of genes related to some other processes in ES, whereas no model has been established by ARGs, together with their roles in the prognosis of ES have not been explored, either. Therefore, it's necessary to bridge the gap in this area. In addition, our study is also a supplement of the existing models, which can deepen our understanding of ES from a new perspective.

Clinical characteristics of the enrolled ES patients
Data of 64 and 46 ES samples were extracted from GSE17679 and GSE65155 respectively. The former was set as training set and the latter was set as validation set. The clinical characteristics of the two sets showed that the characteristics of age and gender were similar between the two sets (Table 1).

Potential protective ARGs screened by survival analysis
48 and 46 ARGs were screened by univariate (P value < 0.05) and multivariate Cox analyses (P value < 0.05), respectively (Additional file 1). 18 ARGs were retained by LASSO Cox regression analysis (λ = lambda.1se) (Fig. 1A, B). Then, KM analyses were performed for the 18 ARGs to select potential protective ARGs in training cohort (Figs. 1C, 2A-C). Results showed that ATG2B, ATG10, ATG12 and DAPK1 were ARGs with better prognosis in high expression group (P value < 0.05) ( Fig. 2A-C). Finally, ATG2B, ATG10 and DAPK1 were selected as final AGRs most related to survival with P value < 0.01 in univariate Cox analyses. Map of autophagy pathway in human (hsa04140) were downloaded from KEGG website. 18 ARGs screened by LASSO were mapped to the autophagy pathway map, only 9 were successfully mapped (the red and blue nodes in the map, the red nodes referred to the final ARGs) (Fig. 2D). As was shown in Fig. 2D, ATG2-WIPI complex was involved in the processes from initiation to elongation of the autophagosome directly or by promoting ATG12-ATG-ATG16 complex formation. ATG10 acted as a E2 for ATG12 in promoting ATG12-ATG5-ATG16 complex formation. DAPK could promote the phosphorylation of the Beclin-1 playing an important role in autophagy. Lastly, PPI network showed the interactions among the proteins encoded by genes screened by LASSO and it could be seen that proteins encoded by ATG2B, ATG10 and DAPK1 were important nodes in this network, which in turn reflected their key roles in autophagy (Fig. 2E).

Survival analysis for samples in different risk score groups
After risk scores were calculated and patients were divided into high and low risk score groups by median, KM analysis of the two group were performed. Result showed that there was significant difference between the low (blue curve) and high risk score (red curve) groups (P value = 0.00022) and patients in high score group got better outcomes in training set (Fig. 3A). The characteristics of samples from the two groups in training set were showed in Fig. 3B. The scatter plot in the middle of Fig. 3B showed that there were more blue dots in the upper left of the plot which means patients in high score group (left side of the plot) tended to have longer survival time, while more red dots in the bottom right of the plot which means patients in low score group (right side of the plot) tended to have shorter survival time. In sum, high risk score group tended to have better outcomes than low score group. Heatmap in Fig. 3B showed that ATG2B, ATG10 and DAPK1 tended to be highly expressed in high score group (Samples on the left side were high risk score group, right side were low risk score group. Blue color stood for low gene expression, red color stood for high gene expression in the heatmap).

Evaluation of the prognostic model in training set
A Cox proportional hazards model was established by ATG2B, ATG10 and DAPK1, then visualized by a nomogram (Fig. 4A). An ES patient's survival rate of 1-, 3-and Calibration analysis showed that the predicted 1-year overall survivals were close to the overall survivals observed and the predicted 3-,5-year overall survivals were in high agreement with the overall survivals observed, which indicated high accuracy of the model (Fig. 4B-D). Time-dependent ROC analysis showed that the 1-, 3-, 5-year AUCs were 0.65, 0.73, 0.84 respectively (Fig. 4E). The AUC of 1 year was a little bit low, but the 3-and 5-year AUCs of the model were excellent, which might imply a good long term predictive value for ES patients. Decision curve analysis (DCA) for 5-year prediction of the model indicated a higher net benefit than treat none and treat all plans (Fig. 4F).

Validating the model in validation set
Nomogram for validation set (GSE63155) was shown in Fig. 5A. ATG10, with the longest line in nomogram, played the most important role in the model, which was also seen in training set. KM plot (Fig. 5B) indicated that patients in high risk score group (red curve) got a significant better outcomes than that in low score group (blue curve) (P value = 0.0043). Meanwhile, this was also seen in the middle plot of Fig. 5C: more alive patients with long survival time in high score group (more blue dots in the upper left of the plot). Heatmap of Fig. 5C showed that ATG2B, ATG10 and DAPK1 tended to be highly expressed in high score group (Samples on the left side were high risk score group, right side were low risk score group. Blue color stood for low gene expression, red color stood for high gene expression in the heatmap). C-index of the prognostic model in training set was 0.71 (95%CI: 0.63-0.78), even higher than that in training set. Calibration analysis showed the predicted 1-, 3-,5-year overall survivals by the model were in agreement with the overall survivals observed, which indicated its high prediction accuracy in validation set ( Fig. 5D-F). Time-dependent ROC analysis showed that the 1-, 3-, 5-year AUCs were 0.88, 0.73, 0.79 respectively (Fig. 5G). The AUCs in the validation set were outstanding, they were even higher than that in training set, which suggested excellent prediction accuracy and discriminatory capacities of the model. Besides, DCA for 5-year prediction of the model also indicated a higher net benefit than the other two plans (Fig. 5H).

Differential expression analysis of the final genes
Differential expressions of ATG2B, ATG10 and DAPK1 were explored between different groups in training and validation sets ( Fig. 6A-I). Figure 6A showed that ATG2B  was highly expressed in ES cell lines and tumor tissues and there was significant difference between normal and tumor samples (P < 0.01), but there were no significant differences between cell line and normal groups, cell line and tumor groups. The expressions of ATG10 in tumor tissues was also higher than that in normal tissues (P < 0.05) (Fig. 6B), and it was extremely highly expressed in ES cell lines. Significant differences were found between cell line and normal, cell line and tumor groups (P < 0.0001). At the same time, differential expressions of DAPK1 in these groups were similar to ATG2B (Fig. 6C). In Fig. 6D-F, ES tumor tissue group was divided into high and low risk score groups. Obviously, the expressions of the 3 genes in high score group were significantly higher than that in low score group (P < 0.0001). Meanwhile, we should notice that compared to high score group, the expressions of ATG2B and ATG10 in low score group were closer to normal group, especially ATG10, the expressions of it were almost the same in the two groups (Fig. 6D, E). And compared to other groups, the expressions of the 3 genes in high score group were closer to the cell line group (Fig. 6D-F). Besides, compared to normal group, DAPK1 was significantly highly expressed in both high and low score groups (Fig. 6F). In the validation set, the expressions of ATG2B, ATG10 and DAPK1 were found to be significantly highly expressed in high score groups (p < 0.0001) (Fig. 6G-I).
Identification and functional analysis of the differentially expressed genes between high and low score groups 137 DEGs were screened between high and low score groups (Additional file 2), volcano plot was shown in Fig. 7A. GO clustering of the DEGs showed that the top 3 biological processes (BP) clustered were regulation of developmental growth, fat cell differentiation and amino acid transport, the top 3 cellular components (CC) clustered were presynapse, synaptic membrane, neuron projection terminus, the top 3 molecular functions (MF) clustered were transcription corepressor activity, syntaxin-1 binding, amino acid transmembrane transporter activity (Fig. 7B). The top 5 pathways clustered by KEGG were Neuroactive ligand-receptor interaction, Chemical carcinogenesis receptor activation, Oxytocin signaling pathway, Calcium signaling pathway, MicroRNAs in cancer (Fig. 7C). GSEA showed that the top 4 enriched gene sets: acute myeloid leukemia, alcoholic liver disease, cGMP-PKG signaling pathway, olfactory transduction were all down regulated in the high score group (Fig. 7D). GO (Fig. 7F). PPI network showed that EZH2 and SYNPR were hub genes involved in two independent processes respectively (Fig. 7G).

Discussion
ES is a malignant tumor with high mortality, especially in patients with metastasis and recurrence. Recent development in treatment of ES has significantly improved the longterm survival in localized ES patients, from 10% in the era before chemotherapy to about 70% currently (5-year survival rate) [18][19][20]. However, the survival rates of patients with metastasis and recurrence are still unacceptable low [2]. To date, accumulating evidences suggest a close relationship between autophagy and ES [21][22][23][24][25][26][27][28]. Here, our finding that ATG2B, ATG10 and DAPK1 are potential protective ARGs with prognostic value in ES is valuable. It not only can deepen our insight into this area but may also contribute to the treatment of ES in future.
In our study, ATG2B, ATG10 and DAPK1 were strictly screened as the final potential protective ARGs by univariate and multivariate Cox regression, LASSO (Fig. 1A, B), KM analysis (Figs. 1C, 2A-C), pathway map analysis (Fig. 2D) and PPI network analysis (Fig. 2E) for the prognostic model of ES. The prediction accuracy and discriminatory capacities were accessed in both training and validation sets by KM analysis, risk Differential analysis between high and low risk score groups and functional analyses of DEGs. A Volcano plot for DEGs between high and low risk score groups: x-axis referred to log 2 Fold change of gene expression between the two groups, y-axis referred to − log 10 (adjusted P value) of gene expression between the two groups. red dots stood for significantly up regulated genes and blue dots stood for significantly down regulated genes in high risk score group, grey dots stood for non differentially expressed genes. B Dot plot for GO analysis of DEGs: x-axis referred to gene ratio, y-axis referred to clustered GO process. C Dot plot for KEGG analysis of DEGs: x-axis referred to gene ratio, y-axis referred to clustered KEGG process. D GSEA analysis for DEGs: x-axis referred ranked samples by enrichment score, y-axis referred to running enrichment score (up) and ranked list metric (down). E Chord plot for top 7 clustered GO terms: left side of the chord was the top7 clustered GO process, right side of the chord was the top DEGs by fold change. F Chord plot for top 7 clustered KEGG pathways: left side of the chord was the top7 clustered KEGG process, right side of the chord was the top DEGs by fold change. G PPI network of the DEGs score plots, C-index, calibration analysis, time-dependent ROC analysis and DCA. KM analysis showed that high risk score group got a better outcome than low score group (P < 0.05) (Figs. 3A, 5B). Distribution characteristics of high and low risk score group samples also confirmed that high score group tended to have longer survival time (Figs. 3B, 5C). Expression heatmap of the hubgenes indicated their high expression in high score group (Figs. 3B, 5C). The expression trend of the hubgenes were coincident with the risk score in samples. C-index was 0.68 in training set and 0.71 in validation set, both indicated high prediction accuracy of the model. The more expression of the hubgenes, the less points and the higher survival rate a patient got in nomograms (Figs. 4A, 5A) also supported the potential protective function of the hubgenes in ES. Calibration analysis (Figs. 4B-D, 5D-F) and time-dependent ROC analysis (Figs. 4E, 5G) all indicated high prediction accuracy of the model both in training and validation set. However, we noticed that the prediction accuracy of 1 year in training set was lower than that of 3-and 5-year (Figs. 4A, E, 5E). We considered this kind of fluctuation was mainly associated with two reasons here. Firstly, sequencing accuracy and sampling error including small sample size. We could see that the quality of the data in training set was inferior to that in the validation set by density plots. Secondly, differential expression of the genes in different stages and subtypes of tumor and their weight of role in the disease evolution. It's true that some biomarkers worked well with long term survival, but with poor performance in short term survival prediction. The reliability of the model could be enhanced by improving the conditions above. In the end, DCA in training and validation sets indicated higher net benefits by the model than treat none and treat all strategy (Figs. 4F, 5H). In summary, the model established by the 3 potential protective hubgenes had high prediction accuracy and good applicability.
Moreover, the expression profiles of the 3 ARGs were explored in different groups in training and validation sets. First of all, we compared their expressions in normal tissues, ES cell lines and tumor tissues. Comparing to normal group, ATG2B, ATG10, DAPK1 were all high expressed in tumor and cell lines (Fig. 6A-C). Meanwhile, the expressions in cell lines even higher than that in tumor tissues, which might be mainly due to the higher purity of tumor cells in cell lines. Secondly, comparison of their expressions between high and low risk score groups showed that they were all highly expressed in high score group both in training and validation sets (Fig. 6D-I). The interesting thing was that when compared to normal and cell line groups, the expressions in low score group tended to be closer to that in normal group, which suggested that in vivo ES cells might gain more aggression by inhibiting autophagy. A potential explanation for this might be that autophagy could lead to programmed cell death in tumor cells while low autophagy might add to the accumulation of malignant mutations [5][6][7][8].
In addition, in order to further explore the potential reasons for the difference in prognosis of the two score groups, differential and functional analyses of the two groups were performed. GO clustering of DEGs suggested that DEGs were mainly involved in membrane and transmembrane transport, which could be activated in autophagy process (Fig. 7B, E). Meanwhile, we noticed that Calcium signaling pathway and MicroR-NAs in cancer were top 5 clustered pathways in KEGG (Fig. 7C, F). It had been reported that calcium signaling pathway was mainly involved in the regulations of apoptosis, autophagy and cell proliferation in cancers and any impairment to this function might result in low sensitivity to cell death inducers, thereby promoting tumor growth and metastasis [29]. While MicroRNAs were frequently considered as post-transcriptional regulators of gene expression and many key ARGs were also regulated by MicroRNAs [30,31]. In addition, we also noticed that cGMP-PKG signaling pathway was enriched by GSEA (Fig. 7D). cGMP-PKG signaling pathway could inhibit apoptosis by decreasing the activity of caspase 3. While in our study, cGMP-PKG signaling pathway was down regulated in high score group which meant that the inhibition of apoptosis was released in that group and this might be also one of the reasons for the better prognosis in high score group. PPI network analysis of DEGs got two hub proteins, which might also contribute to the prognostic difference in the two groups (Fig. 7G). Firstly, SYNPR, an intrinsic membrane protein of small synaptic vesicles, SYNPR may be involved in autophagy through membrane transport. It's also involved in the top GO and KEGG processes (Fig. 7E, F). Secondly, EZH2 was upstream protein involved in cancer initiation, progression, metastasis, metabolism, drug resistance and immunity regulation [32]. Over expression of EZH2 might associated with the metastasis and poor prognosis in ES, but the mechanism was still unclear and it's not involved in the top GO and KEGG processes in our study [33][34][35].
Above all, our results suggested ATG2B, ATG10 and DAPK1 were potential protective ARGs with prognostic value in ES. They might affect the prognosis of ES by autophagy and interactions with other genes and processes directly or indirectly.
Firstly, ATG2B, autophagy related 2B, encoded a protein required for autophagy, which was involved in autophagosome formation. Mutation of this gene was reported to be associated with predisposition to myeloid malignancies (NCBI Reference Sequences (RefSeq)) [36,37]. Besides, there were also many studies indicating that ATG2B was involved in many other tumors. Mi Ran Kang et al. proved that ATG2B mutation might contribute to cancer development by deregulating the autophagy process in gastric and colorectal carcinomas with high microsatellite instability [38]. Jiali Wei et al. revealed that cell proliferation was inhibited in non-small cell lung cancer cells by targeting ATG2B to inhibit autophagy [39,40]. Xuemei Zhang et al. found that methylation of ATG2B CpG island promoter might be associated with the initiation and progression of breast carcinoma [41]. Xiaoqing Bi et al. reported that upregulation of ATG2B could inhibit cell proliferation and lead to cell apoptosis in cutaneous squamous cell carcinoma cells [42]. There were also many reports indicating that ATG2B was associated with drug resistance by activating autophagy in multiple tumors [10,11]. Although ATG2B had not been reported in ES, but there had been plenty of reports that autophagy was associated with the prognosisi of ES [19][20][21][22][23][24][25][26]. In our study, ATG2B was found to be a potential protective factor in ES, the same as it's in gastric and colorectal carcinomas, breast carcinoma and cutaneous squamous cell carcinoma, but opposite to non-small cell lung cancer.
Secondly, ATG10, autophagy related 10, was an E2-like enzyme involved in the ubiquitin-like modification of ATG12-ATG5-ATG16 complex, which was essential for the elongation and maturation of autophagosomes. Meanwhile, ATG10 was also found to play important roles in several tumors. It's reported that potentially functional polymorphisms in ATG10 were found to be associated with risk of breast cancer and acute myeloid leukemia [43,44]. Hao Shen et al. declared that ATG10 could inhibit tumor migration and invasion by activating autophagy and apoptosis in papillary thyroid carcinomas [45]. Qing-Hua Cao et al. analyzed 352 patients by bioinformatics methods finding that ATG10 was a favorable prognostic factor for the overall survival in gastric cancer [46]. Yoon Kyung Jo et al. found that knock down of ATG10 promoted cell migration and invasion of colorectal cancer cells [47]. Kaipeng Xie et al. revealed that high expression of ATG10 leaded to short survival by facilitated tumor cell proliferation and migration in lung cancer. Herein, we found ATG10 was a favorable factor in the prognosis of ES. The same effect was also found in papillary thyroid carcinomas, gastric cancer, colorectal cancer, while the opposite effect was seen in lung cancer.
Lastly, DAPK1, death associated protein kinase 1, was a positive mediator of gammainterferon induced programmed cell death and a candidate for antioncogene (NCBI Ref-Seq). It's act as a tumor suppressor in multiple cancers, such as lymphomas [48], invasive ductal carcinoma [49], gastrointestinal cancer [50], gastric cancer [51], liver cancer [52], etc. It was reported to suppress tumor genesis and progress by promoting autophagy and apoptosis and this was in coincidence with the conclusion reached by our study [53,54].
Although the function of autophagy was complex and often contradictory in cancers, but there were many evidences supporting that upregulation of ATG2B, ATG10, DAPK1 might contribute to good prognosis by promoting autophagy and apoptosis in ES. Conglin Ye et al. reported that the proliferation, invasion and migration of Ewing sarcoma cells were decreased by knocking down Beclin-1 which was required for the formation of autophagosomes [23]. Séverine Lorin et al. found that 2-methoxyestradiol could treat ES patients by enhancing autophagy and apoptosis through the activation of both p53 and JNK pathways [26,27]. Mojgan Djavaheri-Mergny et al. demonstrated that autophagy might amplify apoptosis and stimulation of autophagy might be a potential way to bypass NF-kappaB induced drug resistance in ES [28].
Generally, ATG2B, ATG10, DAPK1 were found to be potential protective ARGs with prognostic value in ES and established a prognostic model. The model was successfully validated in an independent external cohort with excellent prediction accuracy and discriminatory capacities. The potential cause of the different prognoses between high and low score groups was that they might affect the apoptosis and malignance of ES cells through autophagy and crosstalk with other processes.
In the end, some limitations of this study should be noticed. For one thing, because ES is a rare tumor, data with large sample size are not available. For another thing, our conclusion is only validated in the existing public datasets and literature.

Conclusions
ATG2B, ATG10 and DAPK1 were autophagy related genes with potential protective function in ES. The prognostic model established by them exhibited excellent prediction accuracy and discriminatory capacities. They might be used as potential prognostic biomarkers and therapeutic targets in ES.

Data collection
The Gene Expression Omnibus (GEO) (https:// www. ncbi. nlm. nih. gov/ geo/) is an international public repository that provides microarray, next-generation sequencing and other forms of high-throughput functional genomics data submitted by the research community. In this study, we searched GEO website for datasets with RNA expression data of ES tissue specimens, corresponding survival data of patients and including at least 40 [56]. Meanwhile, GSE17679 also contained RNA expression data of 11 in vitro cultured Ewing sarcoma cell lines (GSM439930-GSM439940: Ewing sarcoma cell line 6647, IOR-BRZ, IOR-BRZ, IOR-CLB, IOR-NGR, IOR-RCH, LAP35, RDES, SKES1, SKNMC, TC71) and 18 normal muscle tissue RNA expression data measured by the same method [55]. RNA expression data from GSE17679 and GSE63155 had already been normalized by RMA method and quantile method, respectively. Quality control was performed by box plots (Additional file 3A-B) and density plots (Additional file 3C-D). Results showed that the expression data were well normalized and comparable in each set with high quality and no outlier. 222 ARGs were downloaded from the Human Autophagy Database (http:// www. autop hagy. lu/) (March 27 2022).

Identify ARGs with prognostic significance by univariate and multivariate Cox analyses and LASSO Cox regression analysis
Only common ARGs expressed both in GSE17679 and GSE63155 datasets were selected for survival analyses in case ARGs identified in training set could not be validated in validation set. In our study, GSE17679 cohort was set as training set, the other validation set. "survival" package was adopted to integrate survival data of samples for Cox analysis [57]. Then, univariate Cox regression analysis were used to screen genes with prognostic value. Genes with P value < 0.05 in univariate Cox regression analysis were further screened by multivariate Cox regression analysis (P value < 0.05). Thereafter, genes filtered by multivariate Cox regression analysis were further screened again by the least absolute shrinkage and selection operator (LASSO) Cox regression analysis with 10 times cross validation (λ = lambda.1se) [58].

Screen potential protective ARGs by KM analysis and explore their roles in autophagy
Then, KM analysis was used to explore the prognostic value of each retained gene by dividing patients into high and low expression groups with median expression of the gene. Genes with better prognosis in high expression group in KM analyses (P value < 0.05) and meanwhile with P value < 0.01 in univariate Cox regression analysis were selected as final potential protective ARGs for a prognostic model in ES. Autophagy pathway map was downloaded from KEGG website (https:// www. genome. jp/ kegg/) and protein protein interact (PPI) network analysis for proteins encoded by genes screened by LASSO were also employed to explore their roles in autophagy (interaction score ≥ 0.4).

Risk score calculation and survival analysis for samples in different groups
Risk scores for samples were calculated by sum up the risk score of each final ARG and risk score for a gene was calculated by multiplying its expression value by its coefficient value in multivariate Cox regression model (Formula: Risk score = n i=1 Exp gene i × coefficient gene i , Exp gene i referred to the expression of gene i, coefficient gene i referred to the coefficient value of gene i, n referred to the number of genes involved in the model). Then, samples were divided into low and high risk score group by median of the risk scores for samples. Thereafter, KM analysis was used to investigate the prognosis of the two groups. Meanwhile, scatter plots and heatmap were used to exhibit the distribution characteristics of samples and hubgenes in the two groups.

Establish a prognostic model and evaluate it in training set
Final potential protective ARGs were used to build a prognostic model. Nomogram was used to visualize the Cox proportional hazards model. Then, the prediction accuracy and discriminatory capacities were assessed by C-index, calibration analysis, time-dependent ROC analysis and DCA in GSE17679.

Validate the model in validation set
The prediction accuracy and discriminatory capacities of the model were also validated in GSE63155 by KM analysis, risk score plots, C-index, calibration analysis, time-dependent ROC analysis and DCA.

Differential expression analysis of the final ARGs
Differential expression analysis of the final genes was investigated in GSE17679 between normal tissue, ES cell line, low and high risk score groups by Wilcoxon method after removing batch effect by ComBat function of "sva" package and P value < 0.05 was considered statistically significant. Meanwhile, differential expression analysis was also performed in GSE63155 between low and high risk score groups but without removing batch effect (no batch effect in GSE63155).
Identification and functional analysis of the differentially expressed genes between high and low score groups "limma" package was used to identify the differentially expressed genes (DEGs) between the high and low risk score groups, Benjamini and Hochberg method was used to adjust the P value (adjusted P value < 0.05, |log 2 FC| > 1) [59]. Then, GO and KEGG clustering, gene set enrichment analysis (GSEA), PPI network analysis were preformed to explore their functional enrichment and interactions (interaction score ≥ 0.4).

Statistical analysis
In this study, R software v3.63 was used to process data and generate charts. PPI network analyses were explored on STRING website (interaction score ≥ 0.4) (https:// cn. string-db. org/) and visualized by Cytoscape software v3.7.1. Flexible statistical methods were adopted for the statistical analysis. The work flow of this study was shown in Additional file 4.